First published: 2020-06-20. Last updated: 2020-06-21.
This post describes the technical design of Podscribe. For the rationale and vision for Podscribe, see https://site.j-henderson.com/blog/Podscribe:%20Vision.
GitHub repo is here: https://github.com/jrhender/Podscribe
My planned architecture for Podscribe consists of the following components. I am planning on using Google Cloud for much of it due to their "always free usage limits" being relatively generous.
High-level system design of Podscribe
An SPA, probably React. In addition to the UI and calling services, the web front end will handle downloading and hashing podcast files.
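As a rough illustration of the hashing step, here is a minimal sketch in Python (the real front end would do this in the browser). The choice of SHA-256 and the helper name `hash_audio` are assumptions for illustration; the design above does not fix a hash algorithm.

```python
import hashlib
import io
from typing import BinaryIO


def hash_audio(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a binary stream.

    The stream is read in chunks so that a large podcast file
    does not have to be held in memory all at once.
    """
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Stand-in bytes; a real caller would pass an open audio file.
    print(hash_audio(io.BytesIO(b"example audio bytes")))
```

The resulting hex string is the kind of value that could serve as the fileHash key used by the endpoints below.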
API Gateway to handle auth, logging, monitoring. I am planning on using API Endpoints.
I am currently planning on a simple REST service for transcription. This design may need to change, depending on how long the actual transcription takes. Possible future extensions include an API to list all available transcripts for a given podcast and the ability to stitch together sections of a podcast.
transcribe(audioFile, fileHash, podcastName=None, podcastEpisode=None, startTime=None, endTime=None)
: Transcribes an audio file and stores the resulting transcript. If the file has already been transcribed, returns the stored transcript directly. This endpoint calls the external Speech-to-Text API (which is relatively costly 💸), so only authenticated users (possibly with valid payment details) can use it.
Parameters:
audioFile (binary): The audio file to transcribe. Encoded as application/x-www-form-urlencoded or multipart/form-data
fileHash (string): The hash of the file to use as the key for transcript storage
podcastName (string): Optional
podcastEpisode (string): Optional
startTime (string): Optional. The start time within the episode of the file in question. Could be useful when retrieving a partial transcript and for stitching transcripts together.
endTime (string): Optional. The end time within the episode of the file in question.
Returns:
(string) Transcript of audio file.
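The "transcribe once, then serve from storage" behaviour described for transcribe can be sketched as follows. This is only a sketch under assumptions: the `store` dict stands in for the real database, and `speech_to_text` is a hypothetical callable standing in for the external Speech-to-Text API.

```python
from typing import Callable, Dict, Optional


def transcribe(
    audio_file: bytes,
    file_hash: str,
    podcast_name: Optional[str] = None,
    podcast_episode: Optional[str] = None,
    start_time: Optional[str] = None,
    end_time: Optional[str] = None,
    *,
    speech_to_text: Callable[[bytes], str],  # hypothetical stand-in for the paid API
    store: Dict[str, dict],                  # hypothetical stand-in for the database
) -> str:
    """Return the transcript for audio_file, calling the paid API at most once per hash."""
    cached = store.get(file_hash)
    if cached is not None:
        # Already transcribed: skip the costly external call.
        return cached["transcript"]

    text = speech_to_text(audio_file)
    store[file_hash] = {
        "fileHash": file_hash,
        "transcript": text,
        "podcastName": podcast_name,
        "podcastEpisode": podcast_episode,
        "startTime": start_time,
        "endTime": end_time,
    }
    return text
```

Keying the store on the file hash means a second request for the same audio never reaches the external API, which is the main cost control in this design.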
getTranscript(fileHash) OR getTranscript(podcastName, podcastEpisode)
: Retrieves a transcript. This API can be made publicly accessible.
Parameters:
fileHash (string): The hash of the file to use as the key for transcript storage. This endpoint
podcastName (string): Name of podcast
podcastEpisode (string): Episode of podcast
Returns:
(string) Transcript if available or a "not yet transcribed" message.
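The two lookup modes of getTranscript could look roughly like this. The list of dicts is a hypothetical stand-in for the document database, and the field names simply mirror the parameter names above.

```python
from typing import Iterable, Optional

NOT_YET_TRANSCRIBED = "not yet transcribed"


def get_transcript(
    documents: Iterable[dict],
    file_hash: Optional[str] = None,
    podcast_name: Optional[str] = None,
    podcast_episode: Optional[str] = None,
) -> str:
    """Look up a transcript by file hash, or by podcast name plus episode."""
    by_hash = file_hash is not None
    by_episode = podcast_name is not None and podcast_episode is not None
    if not (by_hash or by_episode):
        raise ValueError("Provide file_hash, or both podcast_name and podcast_episode")

    for doc in documents:
        if by_hash:
            matched = doc.get("fileHash") == file_hash
        else:
            matched = (
                doc.get("podcastName") == podcast_name
                and doc.get("podcastEpisode") == podcast_episode
            )
        if matched:
            return doc["transcript"]
    return NOT_YET_TRANSCRIBED
```

A real database would answer both lookups with an indexed query rather than a linear scan; the loop here only shows the matching rules.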
I'm planning on using a NoSQL database of some sort, probably a document database. Maybe MongoDB, maybe Firestore. Each transcript and its associated metadata will be a document.
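To make "each transcript is a document" concrete, here is one possible document shape. The field names are assumptions taken from the API parameters above, not a settled schema.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class TranscriptDocument:
    """One transcript plus its metadata (hypothetical schema)."""
    fileHash: str                          # key for storage and lookup
    transcript: str
    podcastName: Optional[str] = None
    podcastEpisode: Optional[str] = None
    startTime: Optional[str] = None
    endTime: Optional[str] = None


def to_document(record: TranscriptDocument) -> dict:
    """Convert to a plain dict, dropping optional fields that were never set."""
    return {key: value for key, value in asdict(record).items() if value is not None}


if __name__ == "__main__":
    record = TranscriptDocument(fileHash="abc123", transcript="Hello.", podcastName="Some Show")
    print(json.dumps(to_document(record), indent=2))
```

A plain dict like this can be handed to either MongoDB or Firestore client libraries, so the choice of database does not have to be made up front.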
This service is used to find a podcast's RSS feed and download information.
getRSS(podcastName)
: Retrieves the RSS feed URL. This API can be made publicly accessible. There is almost certainly already an API for this somewhere (maybe ListenNotes.com or the Apple Lookup API).
Parameters:
podcastName (string): Name of podcast
Returns:
(string) RSS feed URL
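If an existing API is used, getRSS could be a thin wrapper around it. The sketch below assumes Apple's iTunes Search API at https://itunes.apple.com/search and assumes its JSON results carry a feedUrl field; both should be checked against Apple's documentation before relying on them. The helper names are hypothetical.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint; verify against Apple's documentation.
SEARCH_ENDPOINT = "https://itunes.apple.com/search"


def build_lookup_url(podcast_name: str) -> str:
    """Build a search URL for a podcast by name."""
    query = urlencode({"term": podcast_name, "media": "podcast", "limit": 1})
    return f"{SEARCH_ENDPOINT}?{query}"


def extract_feed_url(payload: dict) -> Optional[str]:
    """Pull the RSS feed URL out of a search response, if there is one."""
    for result in payload.get("results", []):
        feed_url = result.get("feedUrl")  # assumed field name
        if feed_url:
            return feed_url
    return None


def get_rss(podcast_name: str) -> Optional[str]:
    """Look up a podcast's RSS feed URL (makes a network request)."""
    with urlopen(build_lookup_url(podcast_name)) as response:
        return extract_feed_url(json.load(response))
```

Splitting URL building and response parsing from the network call keeps the parts that depend on the external API's format easy to test and easy to swap if a different provider is chosen.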
I probably won't need authentication until Podscribe has users other than me, but when I do I will probably use Auth0. Auth0 is supported by Google Cloud Endpoints.